Distributed Bayesian Matrix Decomposition for Big Data Mining and Clustering

Abstract

Matrix decomposition is one of the fundamental tools to discover knowledge from big data generated by modern applications. However, it is still inefficient or infeasible to process very big data using such a method on a single machine. Moreover, big data are often collected and stored in a distributed fashion on different machines. Thus, such data generally bear strong heterogeneous noise. It is essential and useful to develop distributed matrix decomposition for big data analytics. Such a method should scale up well, model the heterogeneous noise, and address the communication issue in a distributed system. To this end, we propose a distributed Bayesian matrix decomposition model (DBMD) for big data mining and clustering. Specifically, we adopt three strategies to implement the distributed computing: 1) accelerated gradient descent, 2) the alternating direction method of multipliers (ADMM), and 3) statistical inference. We investigate the theoretical convergence behaviors of these algorithms. To address the heterogeneity of the noise, we propose an optimal plug-in weighted average that reduces the variance of the estimation. Synthetic experiments validate our theoretical results, and real-world experiments show that our algorithms scale up well to big data and achieve superior or competing performance compared to two typical distributed methods, Scalable-NMF and scalable k-means++.
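
The weighted-average idea mentioned in the abstract can be illustrated with the classical inverse-variance rule. The sketch below is not the DBMD estimator itself, which the abstract does not spell out. It assumes each machine returns an unbiased scalar estimate of the same quantity with its own noise variance, and the names `inverse_variance_average` and `simulate`, along with the noise levels used, are hypothetical. In a plug-in setting the variances would themselves be estimated from data; here they are passed in as given.

```python
import random
import statistics


def inverse_variance_average(estimates, variances):
    """Combine per-machine estimates with weights proportional to 1/variance.

    Assumption: the estimates are unbiased and independent. Under that
    assumption these weights minimise the variance of the combined estimate.
    """
    if not estimates or len(estimates) != len(variances):
        raise ValueError("need exactly one variance per estimate")
    precisions = [1.0 / v for v in variances]
    total = sum(precisions)
    weights = [p / total for p in precisions]  # weights sum to 1
    combined = sum(w * x for w, x in zip(weights, estimates))
    return combined, weights


def simulate(trials=2000, seed=0):
    """Compare the spread of a plain mean and the weighted mean.

    Three hypothetical machines observe the same true value under
    heterogeneous Gaussian noise (standard deviations 0.1, 0.5 and 2.0).
    """
    rng = random.Random(seed)
    true_value = 1.0
    noise_sd = [0.1, 0.5, 2.0]
    variances = [s * s for s in noise_sd]
    plain, weighted = [], []
    for _ in range(trials):
        local = [true_value + rng.gauss(0.0, s) for s in noise_sd]
        plain.append(sum(local) / len(local))
        weighted.append(inverse_variance_average(local, variances)[0])
    # Empirical variance of each combined estimator across the trials
    return statistics.pvariance(plain), statistics.pvariance(weighted)


if __name__ == "__main__":
    plain_var, weighted_var = simulate()
    print(f"plain mean variance:    {plain_var:.4f}")
    print(f"weighted mean variance: {weighted_var:.4f}")
```

With noise this uneven, the plain mean is dominated by the noisiest machine, while the weighted mean leans on the most reliable one, which is the intuition behind down-weighting high-noise sources.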

Similar resources

Entropy-based Consensus for Distributed Data Clustering

The increasingly larger scale of available data and the more restrictive concerns on their privacy are some of the challenging aspects of data mining today. In this paper, Entropy-based Consensus on Cluster Centers (EC3) is introduced for clustering in distributed systems with a consideration for confidentiality of data; i.e. it is the negotiations among local cluster centers that are used in t...

The Clustering and Classification Data Mining Techniques in Insurance Fraud Detection: The Case of Iranian Car Insurance

Given the ever-growing spread of fraud in the insurance sector, especially in automobile insurance, and its negative consequences for insurance companies, applying suitable and efficient methods to identify and detect fraud in this area is essential. Understanding the patterns in data on previously reported claims can help determine whether a damage claim is genuine or not. One of the most common and widely used ways of discovering patterns in data is the use of ...

An Ensemble Clustering for Mining High-dimensional Biological Big Data

Clustering of high-dimensional biological big data is an incredibly difficult and challenging task, as the data space is often too big and too messy. The conventional clustering methods can be inefficient and ineffective on high-dimensional biological big data, because traditional distance measures may be dominated by the noise in many dimensions. An additional challenge in biological big data is ...

High Performance clustering for Big Data Mining using Hadoop

Nowadays, organizations across public and private sectors have made a premeditated decision to turn big data into competitive advantage. The motivation and challenge of extracting value from big data is similar in many ways to the age-old problem of distilling business intelligence from transactional data. Hadoop is a speedily budding ecosystem of components based on big data Map Reduce algorithm a...

Fast Kernel Matrix Computation for Big Data Clustering

Kernel k-Means is a basis for many state of the art global clustering approaches. When the number of samples grows too big, however, it is extremely time-consuming to compute the entire kernel matrix and it is impossible to store it in the memory of a single computer. The algorithm of Approximate Kernel k-Means has been proposed, which works using only a small part of the kernel matrix. The com...

Journal

Journal title: IEEE Transactions on Knowledge and Data Engineering

Year: 2022

ISSN: 1558-2191, 1041-4347, 2326-3865

DOI: https://doi.org/10.1109/tkde.2020.3029582